

Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Chen, Luoxin, Gu, Jinming, Huang, Liankai, Huang, Wenhao, Jiang, Zhicheng, Jie, Allan, Jin, Xiaoran, Jin, Xing, Li, Chenggang, Ma, Kaijing, Ren, Cheng, Shen, Jiawei, Shi, Wenlei, Sun, Tong, Sun, He, Wang, Jiahui, Wang, Siran, Wang, Zhihong, Wei, Chenrui, Wei, Shufa, Wu, Yonghui, Wu, Yuchen, Xia, Yihang, Xin, Huajian, Yang, Fan, Ying, Huaiyuan, Yuan, Hongyi, Yuan, Zheng, Zhan, Tianyang, Zhang, Chi, Zhang, Yue, Zhang, Ge, Zhao, Tianyun, Zhao, Jianqiu, Zhou, Yichi, Zhu, Thomas Hanwen

arXiv.org Artificial Intelligence

LLMs have demonstrated strong mathematical reasoning abilities by leveraging reinforcement learning with long chain-of-thought, yet they continue to struggle with theorem proving due to the lack of clear supervision signals when using natural language alone. Dedicated domain-specific languages like Lean provide clear supervision via formal verification of proofs, enabling effective training through reinforcement learning. In this work, we propose Seed-Prover, a lemma-style whole-proof reasoning model. Seed-Prover can iteratively refine its proof based on Lean feedback, proved lemmas, and self-summarization. To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning. Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F, and achieves over 50% on PutnamBench, outperforming the previous state-of-the-art by a large margin. To address the lack of geometry support in Lean, we introduce a geometry reasoning engine, Seed-Geometry, which outperforms previous formal geometry engines. We use these two systems to participate in IMO 2025 and fully prove 5 out of 6 problems. This work represents a significant advancement in automated mathematical reasoning, demonstrating the effectiveness of formal verification with long chain-of-thought reasoning.
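To make the "clear supervision signal" concrete: in Lean, a proof either compiles or it does not, and the compiler's error messages give precise feedback at the failing step. The following is a minimal illustrative sketch (not taken from the paper; the statements and names are invented for illustration) of the lemma-style pattern the abstract describes, where a verified helper lemma is accumulated and reused in a later goal:

```lean
-- Hypothetical sketch of lemma-style proving: a helper lemma is proved
-- first, verified by the Lean kernel, and then reused in the main goal.
-- In a Seed-Prover-style loop, a failed `by` block would produce compiler
-- feedback that the model uses to revise the proof.
theorem double_eq (n : Nat) : n + n = 2 * n := by
  omega

theorem triple_from_double (n : Nat) : n + n + n = 3 * n := by
  have h := double_eq n  -- reuse the previously proved lemma
  omega
```

Because verification is binary and mechanical, every compiling proof is a trustworthy positive example for reinforcement learning, unlike natural-language proofs, whose correctness is hard to grade automatically.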


Google DeepMind takes step closer to cracking top-level maths

The Guardian

Even though computers were made to do maths faster than any human could manage, the top level of formal mathematics remains an exclusively human domain. But a breakthrough by researchers at Google DeepMind has brought AI systems closer than ever to beating the best human mathematicians at their own game. A pair of new systems, called AlphaProof and AlphaGeometry 2, worked together to tackle questions from the International Mathematical Olympiad, a global maths competition for secondary-school students that has been running since 1959. The Olympiad takes the form of six mind-bogglingly hard questions each year, covering fields including algebra, geometry and number theory. The combined efforts of DeepMind's two systems weren't quite at the level of the very best human competitors.